Interpretable ML @ Avanade ITS

EmTech V-Team - Explainable AI

Nema Sobhani
IT Analytics, Avanade

Objective

Explainability/interpretability overview and demonstration of ML tools in the Azure stack used at Avanade.

I. Background

Due to our unique relationship with Microsoft, we have direct access to the product owners (May Hu, Mehrnoosh Sameki, Ilya Matiach) of Microsoft's cutting-edge machine learning interpretability and explainability tools, including interpret-ml, interpret-community, and the azureml SDK.

Why do we need Interpretable ML?

"The goal of science is to gain knowledge, but many problems are solved with big datasets and black box machine learning models. The model itself becomes the source of knowledge instead of the data. Interpretability makes it possible to extract this additional knowledge captured by the model."

- Christoph Molnar, ‘Interpretable Machine Learning’

Applications

  • Financial Services/Banking (fraud detection)
  • Marketing (user engagement)
  • Healthcare (individualized medicine and tracking)
  • Epidemiology (disease outbreak modeling)

Benefits

  • Validation of domain knowledge
  • Provides actionable evidence
  • Guides data practices and feedback

SHAP!

SHapley Additive exPlanations

Gives feature importance values that are locally accurate and consistent, at both the global (whole dataset) and local (single prediction) level, by averaging each feature's marginal contribution to a prediction (drawn from Lloyd Shapley's work in cooperative game theory).

Ideal for use with opaque models (boosted trees, kernel-based models, neural networks, etc.).
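To make the "individual contributions" idea concrete, here is a minimal brute-force sketch of exact Shapley values for a tiny model. The model, the instance `x`, and the all-zero `baseline` are hypothetical, and this enumeration is only feasible for a handful of features; the shap package relies on faster model-specific or sampling-based algorithms instead.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of feature i when it joins coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical toy model with one interaction term (x0 * x2)
def model(v):
    return 2 * v[0] + 3 * v[1] + v[0] * v[2]

x = [1.0, 2.0, 3.0]         # instance being explained (assumed)
baseline = [0.0, 0.0, 0.0]  # reference values for "absent" features (assumed)

def value_fn(coalition):
    # Features in the coalition take the instance value; the rest stay at baseline
    mixed = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return model(mixed)

phi = shapley_values(value_fn, len(x))
print(phi)
```

The contributions sum to `model(x) - model(baseline)`, which is the local accuracy property, and the interaction term is split evenly between the two features involved.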

MSFT's Interpretability Offerings

Microsoft addressed the need for a unified API, built into its machine learning platform, that makes it easy to get model explanations and feature importances across model types.

  • From the SDK
    • pip install --upgrade azureml-sdk[explain,interpret,notebooks]
  • Interpretability package only
    • pip install interpret-community

The TabularExplainer object detects the model type and selects the appropriate SHAP explainer to generate feature importances.

Original Model               Invoked Explainer
---------------------------  --------------------
Tree-based models            SHAP TreeExplainer
Deep Neural Network models   SHAP DeepExplainer
Linear models                SHAP LinearExplainer
None of the above            SHAP KernelExplainer

This package also supports a Mimic Explainer (Global Surrogate) and a Permutation Feature Importance Explainer (PFI), both of which are model-agnostic and will be covered later.

II. In Action

Dummy Example

Scikit-Learn Breast Cancer Binary Classification

Pip installations:

pip install numpy pandas scikit-learn lightgbm interpret-community[visualization]

In [1]:
import numpy as np
import pandas as pd
import sklearn.datasets as datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
import lightgbm as lgbm
In [2]:
# Load and partition data
data = datasets.load_breast_cancer()

X = data.data
y = data.target # 0 = malignant, 1 = benign
feature_names = data.feature_names.tolist()
classes = data.target_names.tolist()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.25, random_state=42, stratify=y_train) 
In [3]:
# Model training
clf = lgbm.LGBMClassifier()

clf.fit(
    X=X_train,
    y=y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric='auc',
    feature_name=feature_names,
    verbose=25
)

y_pred = clf.predict(X_test)

print('\nConfusion Matrix: \n', confusion_matrix(y_test, y_pred))
print('\nClassification Report: \n', classification_report(y_test, y_pred))
[25]	valid_0's auc: 0.989519	valid_0's binary_logloss: 0.145683
[50]	valid_0's auc: 0.987226	valid_0's binary_logloss: 0.136426
[75]	valid_0's auc: 0.986243	valid_0's binary_logloss: 0.172291
[100]	valid_0's auc: 0.989846	valid_0's binary_logloss: 0.189166

Confusion Matrix: 
 [[40  2]
 [ 3 69]]

Classification Report: 
               precision    recall  f1-score   support

           0       0.93      0.95      0.94        42
           1       0.97      0.96      0.97        72

    accuracy                           0.96       114
   macro avg       0.95      0.96      0.95       114
weighted avg       0.96      0.96      0.96       114

In [4]:
# Feature Importance (SHAP)
from interpret_community import TabularExplainer

explainer = TabularExplainer(clf, initialization_examples=X_train, features=feature_names, classes=classes)
In [5]:
# Global Feature Importances
global_explanation = explainer.explain_global(X_train)
display(pd.DataFrame.from_dict(global_explanation.get_feature_importance_dict(), orient='index', columns=['SHAP Value']).head(10))
c:\users\nema.sobhani\appdata\local\continuum\anaconda3\envs\secops\lib\site-packages\shap\explainers\tree.py:194: UserWarning: LightGBM binary classifier with TreeExplainer shap values output has changed to a list of ndarray
  warnings.warn('LightGBM binary classifier with TreeExplainer shap values output has changed to a list of ndarray')
                      SHAP Value
worst area              2.341634
worst concave points    1.823587
worst perimeter         1.361179
worst texture           1.167765
area error              0.810253
mean concave points     0.637934
worst smoothness        0.334095
mean texture            0.311042
worst radius            0.200612
mean smoothness         0.200187
In [6]:
# Local Feature Importances (for predicting benign class)
# Index 1 selects the per-sample SHAP values for class 1 (benign)
shap_benign = global_explanation.local_importance_values[1]

df_shap = pd.DataFrame(shap_benign, columns=feature_names)
display(df_shap.head())
mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry mean fractal dimension ... worst radius worst texture worst perimeter worst area worst smoothness worst compactness worst concavity worst concave points worst symmetry worst fractal dimension
0 -0.021539 0.321744 -0.021727 -0.001728 -0.782009 0.013229 -0.071580 -0.124397 -0.043490 -0.034941 ... 0.188589 1.557899 1.157621 2.275592 -0.408454 -0.001883 0.029665 1.730362 0.047198 -0.109017
1 -0.126181 -0.371998 -0.073260 -0.001627 -0.214240 -0.027172 -0.291059 -0.304990 -0.199762 -0.004075 ... -0.041869 -2.656456 -2.995181 -0.203860 -0.740862 0.005230 -0.307423 -0.285139 -0.441779 -0.004399
2 0.003145 0.339770 -0.035467 -0.001743 -0.745712 0.039464 0.062067 0.488236 -0.047262 -0.037220 ... 0.035996 1.199373 1.354136 1.989123 -0.379708 0.062644 0.059645 1.566148 -0.050104 0.031548
3 0.067903 0.657175 0.012231 0.003307 0.123375 -0.365281 0.033967 0.533232 0.012120 0.062579 ... 0.079415 2.114178 0.219407 1.221378 0.153687 0.045893 0.045342 1.441597 0.435639 -0.054577
4 -0.055729 -0.170627 -0.046742 -0.002422 -0.724170 0.009585 -0.229870 -0.462293 -0.172071 -0.042171 ... -0.077372 -2.193885 1.002135 2.284974 -0.459759 -0.027261 -0.353700 -5.373368 -0.290265 -0.026400

5 rows × 30 columns

In [7]:
# Visualization
from interpret_community.widget import ExplanationDashboard

ExplanationDashboard(global_explanation, clf, datasetX=X_train, trueY=y_train)
Out[7]:
<interpret_community.widget.explanation_dashboard.ExplanationDashboard at 0x2d90d832e80>

Avanade Interpretability VM (Demo)

III. Other Approaches

Different Methods

Local Interpretable Model-agnostic Explanations (LIME)

An interpretable surrogate model is trained on the opaque model's predictions for perturbed samples around a single instance, yielding a local, interpretable explanation. The explanation is not guaranteed to be globally relevant.
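The local-surrogate idea can be sketched in a few lines. This is a simplified illustration rather than the lime package itself: the opaque model, the Gaussian perturbation `scale`, and the exponential `kernel_width` are all assumed values, and the surrogate is a plain weighted linear regression solved by hand.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def lime_explain(predict_fn, instance, n_samples=2000, scale=0.5,
                 kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance."""
    rng = random.Random(seed)
    dim = len(instance)
    xtwx = [[0.0] * (dim + 1) for _ in range(dim + 1)]
    xtwy = [0.0] * (dim + 1)
    for _ in range(n_samples):
        # 1. Perturb the instance with Gaussian noise
        offsets = [rng.gauss(0.0, scale) for _ in range(dim)]
        sample = [v + d for v, d in zip(instance, offsets)]
        # 2. Query the opaque model at the perturbed point
        pred = predict_fn(sample)
        # 3. Weight the sample by its proximity to the instance
        weight = math.exp(-sum(d * d for d in offsets) / kernel_width ** 2)
        # 4. Accumulate the weighted normal equations for [intercept, offsets]
        row = [1.0] + offsets
        for i in range(dim + 1):
            xtwy[i] += weight * row[i] * pred
            for j in range(dim + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    coef = solve_linear_system(xtwx, xtwy)
    return coef[0], coef[1:]  # intercept, per-feature local weights

# Hypothetical opaque model: nonlinear in feature 0, linear in feature 1
def opaque_model(v):
    return v[0] ** 2 + 3.0 * v[1]

intercept, local_weights = lime_explain(opaque_model, [2.0, 1.0])
print(local_weights)
```

Around the point (2, 1) the surrogate weights should land near the local slopes 4 and 3; explaining a different instance would give different weights, which is why the result is local only.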

Global Surrogate Models (Mimic Explainers)

Same idea as LIME, but applied at the global scale. The surrogate must be an interpretable model (tree or linear) and is trained on the original data, with the opaque model's predicted labels as the target.
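A minimal sketch of a global surrogate, assuming a hypothetical opaque classifier and synthetic data: a one-split decision stump is fit to the opaque model's predicted labels, and its agreement with those labels (fidelity) indicates how far the surrogate can be trusted. This is not the MimicExplainer implementation, only the underlying idea.

```python
import math
import random

def opaque_model(row):
    # Hypothetical stand-in for a trained black-box classifier
    return int(3.0 * row[0] + 0.2 * math.sin(5.0 * row[1]) > 1.5)

def fit_stump(rows, labels):
    """Find the rule `feature > threshold` that best reproduces `labels`."""
    best = (0.0, 0, 0.0)  # (agreement, feature index, threshold)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            preds = [int(row[feature] > threshold) for row in rows]
            agreement = sum(p == l for p, l in zip(preds, labels)) / len(labels)
            if agreement > best[0]:
                best = (agreement, feature, threshold)
    return best

rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(300)]

# The surrogate's target is the opaque model's prediction, not the true label
opaque_labels = [opaque_model(row) for row in rows]

fidelity, feature, threshold = fit_stump(rows, opaque_labels)
print(f"surrogate rule: x{feature} > {threshold:.3f} (fidelity {fidelity:.2%})")
```

A low fidelity score would mean the surrogate's rule says little about the opaque model, so fidelity is worth reporting alongside any surrogate explanation.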

Permutation Feature Importance (PFI)

Shuffles the dataset one feature at a time and measures the effect on a performance metric. Features whose shuffling causes larger changes are more important.
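The procedure is short enough to sketch directly. The model, data, and accuracy metric below are hypothetical placeholders; the function names are illustrative and do not come from interpret-community.

```python
import random

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict_fn, rows, targets, metric,
                           n_repeats=5, seed=0):
    """Average drop in `metric` when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(targets, [predict_fn(r) for r in rows])
    importances = []
    for feature in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column, leaving every other feature untouched
            column = [r[feature] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:feature] + [v] + r[feature + 1:]
                        for r, v in zip(rows, column)]
            score = metric(targets, [predict_fn(r) for r in shuffled])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical model: relies heavily on x0, slightly on x1, ignores x2
def model(row):
    return int(row[0] + 0.2 * row[1] > 0.6)

rng = random.Random(42)
rows = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
targets = [model(r) for r in rows]  # assume the model fits the data perfectly

importances = permutation_importance(model, rows, targets, accuracy)
print(importances)
```

Because the method only needs predictions and a metric, it works with any model type; the ignored feature scores exactly zero here.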

Diverse Counterfactual Explanations (DiCE)

Perturbs feature values to find the changes that would move an instance from one class to another, giving actionable requirements for shifting between classes.

e.g. If the credit score were > 700, user X would likely move into the "Loan Approved" classification.
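The loan example can be sketched as a one-feature-at-a-time search. This is a deliberately simplified illustration, not the DiCE algorithm, which optimizes for multiple diverse counterfactuals close to the original instance. The loan rule, applicant values, and step sizes are hypothetical.

```python
def find_counterfactuals(predict_fn, instance, steps, max_steps=100):
    """For each feature, find the smallest single-feature increase
    (in multiples of its step) that flips the model's prediction."""
    original = predict_fn(instance)
    results = {}
    for feature, step in steps.items():
        candidate = dict(instance)  # copy so other features stay fixed
        for _ in range(max_steps):
            candidate[feature] += step
            if predict_fn(candidate) != original:
                results[feature] = candidate[feature]
                break
    return results

# Hypothetical approval rule standing in for an opaque classifier
def loan_model(applicant):
    return 200 * applicant["credit_score"] + applicant["income"] >= 190_000

applicant = {"credit_score": 640, "income": 50_000}  # currently denied

counterfactuals = find_counterfactuals(
    loan_model, applicant, steps={"credit_score": 10, "income": 1_000}
)
print(counterfactuals)
```

Each entry reads as an actionable statement: with everything else unchanged, raising that one feature to the reported value would flip the decision.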

Competing offerings

AWS SageMaker Debugger only recently started using the shap package, which Microsoft has already integrated into Azure ML. The contrast highlights the maturity of Microsoft's early investment in explainable AI.

Oracle's "Skater" is a Python package that supports local interpretation using LIME and global interpretation using scalable Bayesian rule lists and tree surrogates. The documentation is rather sparse, and there doesn't seem to be any momentum to expand to other methods.

Scikit-learn's built-in feature importances provide some value for simple models but lack the depth and versatility of Microsoft's interpret-community.

Explain Like I’m 5 (ELI5) uses LIME and PFI on opaque models and offers specialized support for text classifiers. It does not offer any visual utilities or SHAP.

The web app ml-interpret is an online-only platform where a dataset can be uploaded and a model selected to explain the outcomes of opaque models, but it offers practically no customization and restricts the size of uploads.